Smart Paper Notes

Federated Self-supervised Learning for Video Understanding

Yasar Abbas Ur Rehman, Yan Gao, Jiajun Shen, Pedro Porto Buarque de Gusmao, Nicholas Lane

Category: Computer Vision

2022-07-05

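The ubiquity of camera-enabled mobile devices has led to large amounts of unlabeled video data being produced at the edge. Although various self-supervised learning (SSL) methods have been proposed to harvest the latent spatio-temporal representations in this data for task-specific training, practical challenges, including privacy concerns and communication costs, prevent SSL from being deployed at large scale. To mitigate these issues, we propose applying federated learning (FL) to the task of video SSL. In this work, we evaluate the performance of current state-of-the-art (SOTA) video-SSL techniques and identify their shortcomings when they are integrated into a large-scale FL setting simulated with the Kinetics-400 dataset. We then propose a novel federated SSL framework for video, dubbed FedVSSL, that integrates different aggregation strategies and partial weight updating. Extensive experiments demonstrate the effectiveness and significance of FedVSSL: it outperforms the centralized SOTA on the downstream retrieval task on UCF-101 and HMDB-51, by up to 6.66%.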
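
The abstract names two ingredients of FedVSSL, an aggregation strategy and partial weight updating, without detailing either. The sketch below shows one plausible reading and is not the authors' implementation: a sample-count-weighted average (FedAvg-style) applied only to parameters whose names start with a hypothetical `backbone.` prefix, while all other parameters keep the server's previous values. The function name, the prefix, and the toy parameter dictionaries are assumptions made for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def aggregate_partial(
    global_params: Params,
    client_params: List[Params],
    client_sizes: List[int],
    shared_prefix: str = "backbone.",  # hypothetical naming convention
) -> Params:
    """Weighted-average only the parameters whose names start with `shared_prefix`."""
    total = sum(client_sizes)
    new_params: Params = {}
    for name, old in global_params.items():
        if not name.startswith(shared_prefix):
            # Partial update: parameters outside the shared prefix stay untouched.
            new_params[name] = list(old)
            continue
        averaged = [0.0] * len(old)
        for params, size in zip(client_params, client_sizes):
            weight = size / total  # clients with more samples count more
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        new_params[name] = averaged
    return new_params


# Toy example: two clients holding 1 and 3 samples respectively.
server = {"backbone.w": [0.0, 0.0], "head.w": [1.0]}
clients = [
    {"backbone.w": [1.0, 2.0], "head.w": [5.0]},
    {"backbone.w": [3.0, 4.0], "head.w": [9.0]},
]
updated = aggregate_partial(server, clients, client_sizes=[1, 3])
print(updated)
```

In this toy run the backbone weights move toward the larger client, while `head.w` keeps the server value because it falls outside the shared prefix.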

translated by Google Translate

A Secure Healthcare 5.0 System Based on Blockchain Technology Entangled with Federated Learning Technique

Abdur Rehman, Sagheer Abbas, M. A. Khan, Taher M. Ghazal, Khan Muhammad Adnan, Amir Mosavi

Category: Machine Learning

2022-09-16

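In recent years, the global Internet of Medical Things (IoMT) industry has evolved at a tremendous speed. Because of the huge scale and deployment of IoMT networks, security and privacy are key concerns for the IoMT. Machine learning (ML) and blockchain (BC) technologies have significantly enhanced the capabilities and facilities of Healthcare 5.0, spawning a new area known as "Smart Healthcare". By identifying problems early, a smart healthcare system can help avoid long-term damage. This improves patients' quality of life while reducing their stress and healthcare costs. The IoMT enables a range of functionalities in the field of information technology, one of which is smart and interactive healthcare. However, combining medical data into a single storage location to train a powerful machine learning model raises concerns about privacy, ownership, and regulatory compliance under greater centralization. Federated learning (FL) overcomes these difficulties by using a centralized aggregation server to disseminate a global learning model, while local participants keep control of patient information, which ensures data confidentiality and security. This article presents a comprehensive analysis of the findings on blockchain technology entangled with federated learning in Healthcare 5.0. The purpose of this study is to construct a secure health monitoring system in Healthcare 5.0 using blockchain technology and an intrusion detection system (IDS) to detect any malicious activity in a healthcare network, and to enable physicians to monitor patients through medical sensors and take necessary measures periodically by predicting diseases.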

Implicit Equivariance in Convolutional Networks

Naman Khetan, Tushar Arora, Samee Ur Rehman, Deepak K. Gupta

Category: Computer Vision

2021-11-28

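Convolutional neural networks (CNNs) are inherently equivariant under translations; however, they have no equivalent built-in mechanism to handle other transformations such as rotations and changes in scale. Several approaches exist that make CNNs equivariant under other transformation groups by design. Among these, steerable CNNs have been especially effective. However, these approaches require redesigning standard networks with filters mapped from combinations of a predefined basis involving complex analytical functions. We demonstrate experimentally that these restrictions on the choice of basis can lead to model weights that are sub-optimal for the primary deep learning task (e.g., classification). Moreover, such hard-baked explicit formulations make it difficult to design composite networks comprising heterogeneous feature groups. To circumvent these issues, we propose Implicitly Equivariant Networks (IEN), which induce equivariance in the different layers of a standard CNN model by optimizing a multi-objective loss function that combines an equivariance objective with the standard loss term. Through experiments with VGG and ResNet models on the Rot-MNIST, Rot-TinyImageNet, Scale-MNIST, and STL-10 datasets, we show that IEN, even with its simple formulation, performs better than steerable networks. In addition, IEN facilitates the construction of heterogeneous filter groups, allowing the number of channels in CNNs to be reduced by more than 30% while maintaining performance on par with the baselines. The efficacy of IEN is further validated on the hard problem of visual object tracking. We show that IEN outperforms the state-of-the-art rotation-equivariant tracking method while providing faster inference speed.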
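
The abstract describes a loss that adds an equivariance objective to the standard task loss but gives no formula. As a rough illustration only, one common way to write such a term is the mean squared gap between "transform, then apply the layer" and "apply the layer, then transform". The sketch below uses 90-degree rotations on small grids; the function names, the weighting factor, and the toy layers are assumptions and do not reproduce the authors' formulation.

```python
from typing import Callable, List

Grid = List[List[float]]


def rot90(grid: Grid) -> Grid:
    """Rotate a 2D grid by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def mse(a: Grid, b: Grid) -> float:
    """Mean squared difference between two grids of the same shape."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def equivariance_loss(layer: Callable[[Grid], Grid], x: Grid) -> float:
    """Penalize the gap between layer(rot(x)) and rot(layer(x))."""
    return mse(layer(rot90(x)), rot90(layer(x)))


def total_loss(primary: float, layer: Callable[[Grid], Grid], x: Grid,
               weight: float = 1.0) -> float:
    """Multi-objective loss: task loss plus a weighted equivariance penalty."""
    return primary + weight * equivariance_loss(layer, x)


def double(x: Grid) -> Grid:
    """Pointwise layer; commutes with rotation, so its penalty is zero."""
    return [[2 * v for v in row] for row in x]


def horizontal_diff(x: Grid) -> Grid:
    """Cyclic horizontal difference; direction-dependent, so not rotation-equivariant."""
    return [[row[j] - row[j - 1] for j in range(len(row))] for row in x]


x = [[1.0, 2.0], [3.0, 4.0]]
print(equivariance_loss(double, x), equivariance_loss(horizontal_diff, x))
```

The pointwise layer incurs no penalty, while the direction-dependent layer does, which is the signal such a term would feed back during training.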

The CORSMAL benchmark for the prediction of the properties of containers

Alessio Xompero, Santiago Donaher, Vladimir Iashin, Francesca Palermo, Gökhan Solak, Claudio Coppola, Reina Ishikawa, Yuichi Nagao, Ryo Hachiuma, Qi Liu

Category: Computer Vision

2021-07-27

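Acoustic and visual sensing can support the contactless estimation of the weight of a container and the amount of its content while the container is manipulated by a person. However, opaqueness and transparency (of both the container and the content), together with the variability of materials, shapes, and sizes, make this problem challenging. In this paper, we present an open framework to benchmark methods for estimating the capacity of a container and the type, mass, and amount of its content. The framework includes a dataset, well-defined tasks and performance measures, baselines and state-of-the-art methods, and an in-depth comparative analysis of these methods. Deep learning with neural networks, using audio alone or a combination of audio and visual data, is the most common approach to classifying the type and amount of the content, either independently or jointly. Regression and geometric approaches with visual data are preferred for determining the capacity of the container. Results show that classifying the content type and level using only audio as the input modality achieves weighted average F1-scores of up to 81% and 97%, respectively. Estimating the container capacity with vision-only approaches and the filling mass with audio-visual, multi-stage algorithms reaches weighted average capacity and mass scores of up to 65%.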

Hand-breathe: Non-Contact Monitoring of Breathing Abnormalities from Hand Palm

Kawish Pervez, Waqas Aman, M. Mahboob Ur Rahman, M. Wasim Nawaz, Qammer H. Abbasi

Category: Machine Learning

2022-12-12

In the post-COVID-19 world, radio frequency (RF)-based non-contact methods, e.g., software-defined radio (SDR)-based methods, have emerged as promising candidates for intelligent remote sensing of human vitals, and could help contain contagious viruses like COVID-19. To this end, this work uses universal software radio peripheral (USRP)-based SDRs along with classical machine learning (ML) methods to design a non-contact method for monitoring different breathing abnormalities. Under our proposed method, a subject rests his/her hand on a table between the transmit and receive antennas while an orthogonal frequency division multiplexing (OFDM) signal passes through the hand. Subsequently, the receiver extracts the channel frequency response (basically, fine-grained wireless channel state information) and feeds it to various ML algorithms, which classify the different breathing abnormalities. Among all classifiers, the linear SVM classifier achieved the maximum accuracy of 88.1%. To train the ML classifiers in a supervised manner, data was collected through real-time experiments on 4 subjects in a lab environment. For label generation, the breathing of the subjects was classified into three classes: normal, fast, and slow breathing. Furthermore, in addition to our proposed method (where only a hand is exposed to RF signals), we also implemented and tested the state-of-the-art method (where the full chest is exposed to RF radiation). The performance comparison of the two methods reveals a trade-off: the accuracy of our proposed method is slightly inferior, but it results in minimal body exposure to RF radiation compared to the benchmark method.
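
The abstract says the receiver extracts the channel frequency response and passes it to ML classifiers, but it does not state which estimator is used. A textbook option, shown here purely as an assumption, is the per-subcarrier least-squares estimate H[k] = Y[k] / X[k] from known pilot symbols, followed by taking magnitudes as a simple feature vector. The pilot values, the toy channel, and both function names are hypothetical.

```python
from typing import List


def estimate_cfr(received: List[complex], pilots: List[complex]) -> List[complex]:
    """Per-subcarrier least-squares estimate H[k] = Y[k] / X[k]."""
    return [y / x for y, x in zip(received, pilots)]


def amplitude_features(cfr: List[complex]) -> List[float]:
    """Magnitude per subcarrier, usable as a simple classifier input."""
    return [abs(h) for h in cfr]


# Hypothetical BPSK pilots on four subcarriers and a noiseless toy channel.
pilots = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]
true_channel = [0.5 + 0j, 0.5j, -0.25 + 0j, 1 + 0j]
received = [h * x for h, x in zip(true_channel, pilots)]

cfr = estimate_cfr(received, pilots)
features = amplitude_features(cfr)
print(features)
```

With noise-free toy data the estimate recovers the channel exactly; with real measurements the estimate would be noisy, and a sequence of such feature vectors over time would form the classifier input.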

Single image calibration using knowledge distillation approaches

Khadidja Ould Amer, Oussama Hadjerci, Mohamed Abbas Hedjazi, Antoine Letienne

Category: Computer Vision

2022-12-05

Although recent deep learning-based calibration methods can predict extrinsic and intrinsic camera parameters from a single image, their generalization remains limited by the number and distribution of training data samples. The huge computational and space requirements prevent convolutional neural networks (CNNs) from being deployed in resource-constrained environments. This challenge motivated us to train a CNN incrementally, learning from new data while maintaining performance on previously learned data. Our approach builds upon a CNN architecture that automatically estimates camera parameters (focal length, pitch, and roll), using different incremental learning strategies to preserve knowledge when the network is updated for new data distributions. Specifically, we adapt four common incremental learning methods, namely LwF, iCaRL, LUCIR, and BiC, by modifying their loss functions to fit our regression problem. We evaluate on two datasets containing 299008 indoor and outdoor images. Experimental results were significant and indicated which method was better for camera calibration estimation.

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

Giulia Vezzani, Dhruva Tirumala, Markus Wulfmeier, Dushyant Rao, Abbas Abdolmaleki, Ben Moran, Tuomas Haarnoja, Jan Humplik, Roland Hafner, Michael Neunert

Category: Machine Learning | Artificial Intelligence | Robotics

2022-11-24

The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations. For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert behavior can lead to poor results when given sub-optimal experts. We compare several common approaches for skill transfer on multiple domains, including changes in task and system dynamics. We identify how existing methods can fail and introduce an alternative approach to mitigate these problems. Our approach learns to sequence existing temporally-extended skills for exploration but learns the final policy directly from the raw experience. This conceptual split enables rapid adaptation, and thus efficient data collection, without constraining the final solution. It significantly outperforms many classical methods across a suite of evaluation tasks, and we use a broad set of ablations to highlight the importance of different components of our method.

In-memory factorization of holographic perceptual representations

Jovin Langenegger, Geethan Karunaratne, Michael Hersche, Luca Benini, Abu Sebastian, Abbas Rahimi

Category: Computer Vision | Machine Learning | Neural and Evolutionary Computing

2022-11-09

Disentanglement of constituent factors of a sensory signal is central to perception and cognition and hence is a critical task for future artificial intelligence systems. In this paper, we present a compute engine capable of efficiently factorizing holographic perceptual representations by exploiting the computation-in-superposition capability of brain-inspired hyperdimensional computing and the intrinsic stochasticity associated with analog in-memory computing based on nanoscale memristive devices. Such an iterative in-memory factorizer is shown to solve at least five orders of magnitude larger problems that cannot be solved otherwise, while also significantly lowering the computational time and space complexity. We present a large-scale experimental demonstration of the factorizer by employing two in-memory compute chips based on phase-change memristive devices. The dominant matrix-vector multiply operations are executed at O(1) thus reducing the computational time complexity to merely the number of iterations. Moreover, we experimentally demonstrate the ability to factorize visual perceptual representations reliably and efficiently.

MEDS-Net: Self-Distilled Multi-Encoders Network with Bi-Direction Maximum Intensity projections for Lung Nodule Detection

Muhammad Usman, Azka Rehman, Abdullah Shahid, Siddique Latif, Shi Sub Byon, Byoung Dai Lee, Sung Hyun Kim, Byung il Lee, Yeong Gil Shin

Category: Computer Vision

2022-10-30

In this study, we propose a lung nodule detection scheme that fully incorporates the clinical workflow of radiologists. In particular, we exploit bi-directional maximum intensity projection (MIP) images of various thicknesses (i.e., 3, 5, and 10 mm) along with a 3D patch of the CT scan consisting of 10 adjacent slices, which are fed into a self-distillation-based Multi-Encoders Network (MEDS-Net). The proposed architecture first condenses the 3D patch input to three channels using a dense block, which consists of dense units that effectively examine nodule presence in 2D axial slices. This condensed information, along with the forward and backward MIP images, is fed to three different encoders to learn the most meaningful representation, which is forwarded to the decoder block at various levels. At the decoder block, we employ a self-distillation mechanism by connecting a distillation block that contains five lung nodule detectors. This helps expedite convergence and improves the learning ability of the proposed architecture. Finally, the proposed scheme reduces false positives by complementing the main detector with auxiliary detectors. The proposed scheme has been rigorously evaluated on 888 scans of the LUNA16 dataset and obtained a CPM score of 93.6%. The results demonstrate that incorporating bi-directional MIP images enables MEDS-Net to effectively distinguish nodules from their surroundings, which helps achieve sensitivities of 91.5% and 92.8% at false positive rates of 0.25 and 0.5 per scan, respectively.
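
A maximum intensity projection keeps, for each pixel position, the brightest value across a slab of adjacent slices. The sketch below illustrates forward and backward projections around a center slice. It is only a sketch under stated assumptions: the abstract gives slab thicknesses in millimetres (3, 5, and 10 mm), so the number of slices per slab depends on the scan's slice spacing, and the `n_slices` parameter, the function names, and the toy volume are hypothetical.

```python
from typing import List, Tuple

Slice = List[List[float]]


def mip(slices: List[Slice]) -> Slice:
    """Pixel-wise maximum over a stack of equally sized 2D slices."""
    rows, cols = len(slices[0]), len(slices[0][0])
    return [[max(s[r][c] for s in slices) for c in range(cols)] for r in range(rows)]


def bidirectional_mip(volume: List[Slice], center: int, n_slices: int) -> Tuple[Slice, Slice]:
    """Project `n_slices` slices forward and backward from `center` (inclusive)."""
    forward = mip(volume[center:center + n_slices])
    backward = mip(volume[max(0, center - n_slices + 1):center + 1])
    return forward, backward


# Four toy slices of size 1x2; index 1 is the center slice.
volume = [[[0, 1]], [[5, 0]], [[2, 2]], [[1, 7]]]
fwd, bwd = bidirectional_mip(volume, center=1, n_slices=2)
print(fwd, bwd)
```

Both projections include the center slice, so a bright structure that spans slices in only one direction shows up in one projection and not the other.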

Offensive Language Detection on Twitter

Nikhil Chilwant, Syed Taqi Abbas Rizvi, Hassan Soliman

Category: Natural Language Processing | Machine Learning

2022-09-28

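Detecting offensive language is one of the key challenges facing social media. Researchers have proposed many advanced methods for this task. In this report, we try to use the learnings from their approaches and incorporate our own ideas to improve on them. We achieved an accuracy of 74% in classifying offensive tweets. We also list upcoming challenges in abusive content detection on social media.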
